Principal component analysis for selection of optimal SNP-sets that capture intragenic genetic variation.
Abstract
Candidate gene association studies often analyze only one single nucleotide polymorphism (SNP), and an initial report is typically not replicated by subsequent studies. The failure to replicate may result from incomplete or poor identification of disease-related variants or haplotypes, possibly due to naive SNP selection. We describe a method for identifying linkage disequilibrium (LD) groups and selecting SNPs that capture sufficient intragenic genetic diversity. We assume that all SNPs with a minor allele frequency above a pre-determined threshold have been identified. Principal component analysis (PCA) is applied to evaluate multivariate SNP correlations, to infer groups of SNPs in LD (LD-groups), and to establish an optimal set of group-tagging SNPs (gtSNPs) that provides the most comprehensive coverage of intragenic diversity while minimizing the resources needed for an informative association analysis. The PCA method differs from haplotype block (HB) and haplotype-tagging SNP (htSNP) methods in that an LD-group of SNPs need not form a contiguous DNA fragment. Results of the PCA method compared well with those of existing htSNP methods, and the method offers advantages over them, including an indication of the optimal number of SNPs needed. Further, evaluation over multiple replicates of simulated data indicated that PCA is a robust method for SNP selection. Our findings suggest that PCA may be a powerful tool for establishing an optimal SNP set that maximizes the genetic variation captured for a candidate gene using a minimal number of SNPs.
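The grouping idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the procedure's details, so every specific below is an assumption rather than the authors' method: genotypes are coded as minor-allele counts (0, 1, 2), components are retained when their eigenvalue exceeds 1, loadings are varimax-rotated, each SNP joins the component on which it loads most strongly, and the highest-loading SNP in each group is taken as its gtSNP. The names `varimax` and `pca_ld_groups` are hypothetical, and the data are simulated.

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-8):
    """Standard varimax rotation of a (n_snps, n_components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(n_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation

def pca_ld_groups(genotypes, min_eigenvalue=1.0):
    """Return (groups, tags): SNP indices per LD-group and one tag SNP per group.

    genotypes: (n_individuals, n_snps) minor-allele counts; every SNP must be
    polymorphic, otherwise its correlation is undefined.
    """
    G = np.asarray(genotypes, dtype=float)
    corr = np.corrcoef(G, rowvar=False)        # SNP-by-SNP correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue            # assumed retention rule
    if not keep.any():
        keep[0] = True
    loadings = varimax(eigvecs[:, keep] * np.sqrt(eigvals[keep]))
    group_of = np.argmax(np.abs(loadings), axis=1)
    groups, tags = {}, {}
    for g in np.unique(group_of):
        members = np.flatnonzero(group_of == g)
        groups[int(g)] = members.tolist()
        # Tag SNP: the member with the largest absolute loading on its component.
        tags[int(g)] = int(members[np.argmax(np.abs(loadings[members, g]))])
    return groups, tags

# Simulated example: two LD-groups whose SNPs are interleaved along the gene.
rng = np.random.default_rng(0)
n = 500
base_a = rng.integers(0, 3, size=n)
base_b = rng.integers(0, 3, size=n)

def noisy_copy(base, rate=0.1):
    """Copy a genotype vector, replacing a fraction of entries at random."""
    out = base.copy()
    flip = rng.random(base.size) < rate
    out[flip] = rng.integers(0, 3, size=int(flip.sum()))
    return out

G = np.column_stack(
    [noisy_copy(base_a if j % 2 == 0 else base_b) for j in range(7)]
)
groups, tags = pca_ld_groups(G)
print(groups, tags)
```

In the simulated data the even-numbered and odd-numbered columns form two separate LD-groups, so group membership is non-contiguous, which is the property the abstract contrasts with haplotype-block methods. The number of retained components plays the role of the suggested number of tag SNPs under the assumed eigenvalue rule.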
Journal title:
- Genetic Epidemiology
Volume 26, Issue 1
Pages: -
Publication date: 2004